Adding uint4 dtype implementation #13

Merged
merged 17 commits into from
Feb 10, 2024

Conversation

@jerryzh168 jerryzh168 commented Nov 23, 2023

Stack from ghstack (oldest at bottom):

Summary:
This PR adds preliminary support for uint4 through a tensor subclass; we'll continue to iterate on it.

We plan to move the uint4 tensor subclass to core after it is more mature.

Test Plan:
python test/dtypes/test_int4.py

Reviewers:

Subscribers:

Tasks:

Tags:

Summary:
There is a lot of interest in int4 dtypes, and we'd like to add the dtype outside of PyTorch core.
This PR adds preliminary support for uint4 through a tensor subclass; we'll continue to iterate on it.

Test Plan:
python test/dtypes/test_int4.py

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Nov 23, 2023
ghstack-source-id: fc5acbb
Pull Request resolved: #13
@facebook-github-bot added the CLA Signed label Nov 23, 2023
uint8_data = uint8_data.contiguous().view(-1)
return (uint8_data[::2] << 4 | uint8_data[1::2]).view(down_size(shape))

class UInt4Tensor(torch.Tensor):
Contributor:

We're creating a whole new Tensor here, but really we want to extend a QTensor with a new lower precision backend? Or do we want to combine UInt4Tensor with a QTensor, where QTensor has a UInt4Tensor for storing int4 weights (next to scales etc.)?

Contributor Author:

By QTensor do you mean https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/quantized/QTensorImpl.h? I think we are moving away from that and just relying on PyTorch native dtypes now.

Contributor Author:

@cpuhrsch I guess by QTensor you mean these tensors: https://github.com/pytorch-labs/ao/blob/main/torchao/quantization/subclass.py#L178. This UInt4Tensor can compose with that (it's similar to a uint8 tensor); see the example below (PerChannelSymmetricWeightUInt4Tensor).
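The nibble-packing scheme shown in the diff excerpt above (two uint4 values per byte, with the even-indexed element in the high nibble) can be sketched in plain Python; the helper names are illustrative and torch is deliberately left out for clarity:

```python
def pack_uint4(data):
    # Pack pairs of 4-bit values (each in 0..15) into single bytes:
    # the even-indexed element fills the high nibble, the odd-indexed
    # element the low nibble, mirroring uint8_data[::2] << 4 | uint8_data[1::2].
    assert len(data) % 2 == 0, "need an even number of nibbles"
    return [(hi << 4) | lo for hi, lo in zip(data[::2], data[1::2])]

def unpack_uint4(packed):
    # Inverse: split each byte back into its two nibbles, high nibble first.
    out = []
    for b in packed:
        out.append((b >> 4) & 0xF)
        out.append(b & 0xF)
    return out
```

Round-tripping `pack_uint4` then `unpack_uint4` recovers the original values, which is the invariant the UInt4Tensor storage relies on.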

@bdhirsh (Contributor) commented Nov 29, 2023

The next major issue I think you'll hit is that with torch.compile, we don't have a recipe today for "send the subclass directly to the compiler, instead of desugaring it". Most of the pieces for this to work should already exist, but let me know once the other errors are fixed and we can try to sit down together and get this working.

@jerryzh168 (Contributor Author):

The next major issue I think you'll hit is that with torch.compile, we don't have a recipe today for "send subclass directly to the compiler, instead of desugaring it". Most of the pieces for this to work should already exist, but let me know and once the other errors are fixed we can try to sit down together and get this to work.

Oh, the suggestion from Alban is that we'll desugar it for now, since Triton doesn't have int4 support anyway and we are using handwritten custom kernels; the int4 is just for representation in the frontend.

jerryzh168 added a commit that referenced this pull request Nov 29, 2023
ghstack-source-id: 1a20256
Pull Request resolved: #13
jerryzh168 added a commit that referenced this pull request Nov 29, 2023
ghstack-source-id: d9c853f
Pull Request resolved: #13
jerryzh168 added a commit that referenced this pull request Dec 7, 2023
ghstack-source-id: aa80ed3
Pull Request resolved: #13
jerryzh168 added a commit that referenced this pull request Dec 13, 2023
ghstack-source-id: 6647651
Pull Request resolved: #13
@iseeyuan (Contributor):

Are you planning to add the int4 dtype on the ATen C++ side?

@jerryzh168 (Contributor Author):

Are you planning to add the int4 dtype on the ATen C++ side?

No, we don't. We'll use bits8 instead and encode the int4-related information in the ops that use these tensors, e.g. dequantize_int4(bits8_tensor, scale, zero_point, qmin, qmax, ...)
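As a rough sketch of what such an op could compute (the signature above is hypothetical, and a plain-Python list of bytes stands in for the bits8 tensor here), affine int4 dequantization over packed bytes might look like:

```python
def dequantize_int4(packed_bytes, scale, zero_point, qmin=0, qmax=15):
    # Hypothetical sketch: unpack two 4-bit values from each byte
    # (high nibble first), then apply the usual affine mapping
    # x = (q - zero_point) * scale.
    out = []
    for b in packed_bytes:
        for q in ((b >> 4) & 0xF, b & 0xF):
            q = max(qmin, min(qmax, q))  # defensive clamp to the 4-bit range
            out.append((q - zero_point) * scale)
    return out
```

The point of the design described above is that the storage dtype stays a generic byte type; all int4 semantics live in ops like this one.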

jerryzh168 added a commit that referenced this pull request Dec 18, 2023
ghstack-source-id: 812be7a
Pull Request resolved: #13
jerryzh168 added a commit that referenced this pull request Dec 18, 2023
ghstack-source-id: 05f1c94
Pull Request resolved: #13
jerryzh168 added a commit that referenced this pull request Dec 19, 2023
ghstack-source-id: 9a07ab7
Pull Request resolved: #13
jerryzh168 added a commit to pytorch/pytorch that referenced this pull request Jan 11, 2024
Summary:
These dtypes are added since we see more demand for sub-byte dtypes, especially with
the popularity of LLMs (https://pytorch.org/blog/accelerating-generative-ai-2/#step-4-reducing-the-size-of-the-weights-even-more-with-int4-quantization-and-gptq-2021-toks)

Note these are just placeholders; operator support for these dtypes will be implemented with tensor subclasses.
E.g. torch.empty(..., dtype=torch.uint1) will return a tensor subclass of uint1 that supports operations like bitwise ops, add, mul, etc. (to be added later)

Also note that these are not quantized data types; we'll implement quantization logic with tensor subclasses backed by these dtypes as well.
E.g. `Int4GroupedQuantization(torch.Tensor)` will be implemented with torch.uint4 tensors (see pytorch/ao#13 as an example)

Test Plan:
CIs
python test/test_quantization.py -k test_uint1_7_dtype 

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
jerryzh168 added a commit to pytorch/pytorch that referenced this pull request Jan 11, 2024
jerryzh168 added a commit to pytorch/pytorch that referenced this pull request Jan 12, 2024
jerryzh168 added a commit to pytorch/pytorch that referenced this pull request Jan 12, 2024
jerryzh168 added a commit to pytorch/pytorch that referenced this pull request Jan 12, 2024
ghstack-source-id: 0277ba4
Pull Request resolved: #117208
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Jan 13, 2024
Pull Request resolved: #117208
Approved by: https://github.com/ezyang
jerryzh168 added a commit that referenced this pull request Jan 16, 2024
ghstack-source-id: be18a64
Pull Request resolved: #13
jerryzh168 added a commit that referenced this pull request Jan 17, 2024
ghstack-source-id: 888f289
Pull Request resolved: #13
zero_point: int,
) -> torch.Tensor:
inv_scale = 1.0 / scale
return pack_uint4(torch.clamp(torch.round(input * inv_scale) + zero_point, 0, 15).to(torch.uint8))
Contributor:

For Inductor and export, how do you want these routines to be optimized? Is there a custom kernel for the quantize/dequantize? Or would you prefer for Inductor to "see" this decomposition, so it can try to fuse it into a Triton kernel?

Contributor Author:

I think it will be the latter for now, but maybe in the future we can have custom kernels
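For a concrete picture of the decomposition Inductor would "see", here is the quantize side in plain Python (hypothetical names; the real routine above operates on torch tensors and then packs the result): divide by scale, round, shift by the zero point, and clamp.

```python
def quantize_uint4_values(xs, scale, zero_point):
    # Plain-Python mirror of the routine in the diff:
    # q = clamp(round(x / scale) + zero_point, 0, 15),
    # i.e. clamp to the unsigned 4-bit range before packing.
    out = []
    for x in xs:
        q = int(round(x / scale)) + zero_point
        out.append(max(0, min(15, q)))
    return out
```

Since every step is an elementwise op, leaving the decomposition visible (rather than hiding it behind a custom kernel) gives the compiler a chance to fuse the whole chain.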

self, size = args
size = utils.infer_size(size, self.numel())
assert not kwargs
# WARNING: views not preserved
Contributor:

why aren't we preserving the view?

Contributor Author:

This is copied from Edward's original PR; I'm not exactly sure why. It feels like this is creating a new tensor, so we don't have a view relationship? Is view supposed to share storage?

jerryzh168 added a commit that referenced this pull request Jan 22, 2024
ghstack-source-id: 4b6082c
Pull Request resolved: #13
@jerryzh168 jerryzh168 requested a review from bdhirsh January 22, 2024 18:06
@HDCharles (Contributor) left a comment:

lgtm

jerryzh168 added a commit that referenced this pull request Feb 10, 2024
ghstack-source-id: 51cf717
Pull Request resolved: #13
@jerryzh168 jerryzh168 merged commit 8ab52a7 into gh/jerryzh168/1/base Feb 10, 2024
jerryzh168 added a commit that referenced this pull request Feb 10, 2024
ghstack-source-id: 51cf717
Pull Request resolved: #13
@jerryzh168 jerryzh168 deleted the gh/jerryzh168/1/head branch February 10, 2024 05:50
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
ghstack-source-id: 51cf717
Pull Request resolved: pytorch#13
Labels
CLA Signed: this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.
10 participants